Introduction

In  this  proposal,  we  aimed  to  a)  systematically  monitor  splicing  variant  profiles  in  breast 
cancer  susceptibility  genes  and  b)  explore  the  role  of  alternative  splicing  in  breast 
chemotherapy  using  a  global  strategy.  In  doing  so,  we  hoped  to  identify  and  validate 
candidate  splicing  variants  involved  in  tumorigenesis  using  deep  sequencing  and 
functional  assays.  This  annual  report  summarizes  our  progress  in  the  final  year  of  this 
award  (we  were  granted  a  no-cost  extension). 

Body: 

The  aims  of  our  original  proposal  were  as  follows: 

A.  Systematically  monitor  splicing  variant  profiles  in  breast  cancer  susceptibility  genes. 

B. Explore the role of alternative splicing in breast chemotherapy using a global
strategy.

This  work  was  done  collaboratively  with  Drs.  William  Foulkes  and  Jun  Zhu. 

Although  the  originally  proposed  bar-code  and  chemosensitivity  approach  was  not 
feasible  after  the  departure  of  the  original  PI,  Dr.  Jun  Zhu,  we  were  able  to  develop  a 
complementary  and  powerful  approach  using  allele-specific  RNA-sequencing  to  pursue 
the  same  goals. 

Broadly,  we  pursued  two  complementary  approaches  to  identify  a  role  for  alternative 
splicing  in  the  disease.  First,  we  have  successfully  developed  allele-specific  RNA 
sequencing  as  a  viable  method  for  the  identification  of  global  profiles  of  alternative 
splicing  in  breast  cancers  and,  potentially,  other  malignancies.  Second,  we  have  further 
developed  the  experimental  methods  needed  to  functionally  validate  the  effects  of 
observed  splicing  variants. 

These  approaches  have  generated  a  wealth  of  data  that  we  describe  below. 

Sample  selection: 

In parallel with the efforts of our co-PI, we chose the following breast cancer cell
lines as cases for our sequencing work: HCC1937, SUM149PT, HCC3153, SUM1315, MBC647,
MPC7105, MPC298, MPC600, MPC960, LY3, LY10, TMD8, BJAB and LY19. These were compared
to 14 sequenced lymphoid cell samples as controls.
Identifying alternative splicing events through allele-specific RNA sequencing

RNA-sequencing has emerged as a powerful method for identifying gene expression and
alternative splicing events. We have designed a workflow to identify the alternative
splicing events in breast cancers. This method has the additional advantage of being
able to identify genetic variants and gene rearrangements.

We aligned RNA sequencing data from these cases to the genome using Tophat. After
aligning the reads, we ran SAMtools mpileup to call variants on all aligned reads.
These variants include those involved in alternative splicing events.

We used the program deFuse to search through all unaligned reads for those that span
a gene fusion or gene rearrangement. Integrative Genomics Viewer was used to visualize
the reads aligned to the genome.

On average, we found that 70% of the paired-end reads aligned to the genome as proper
pairs, which is consistent with recent transcriptome sequencing studies. These
sequencing reads map largely to annotated exonic regions and show evidence of
alternative splicing (Figure 2). In all, we identified 248 such events that affected
subgroups within our cases.
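The two quantities discussed above, the fraction of properly paired reads and the presence of spliced alignments, can both be read directly from aligned SAM records. The following is a minimal sketch and not the pipeline used in this work: it assumes the FLAG, POS and CIGAR fields have already been extracted from the alignment file, and the function names are hypothetical. The flag bits and the CIGAR `N` operation (a skipped reference region, which spliced aligners use for introns) follow the SAM format specification.

```python
import re

# SAM FLAG bits (from the SAM format specification)
PAIRED = 0x1          # read is one mate of a pair
PROPER_PAIR = 0x2     # both mates aligned as the aligner expects
SECONDARY = 0x100     # secondary alignment
SUPPLEMENTARY = 0x800 # supplementary alignment

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def proper_pair_fraction(flags):
    """Fraction of primary paired-end records flagged as properly paired."""
    primary = [f for f in flags
               if f & PAIRED and not f & (SECONDARY | SUPPLEMENTARY)]
    if not primary:
        return 0.0
    return sum(1 for f in primary if f & PROPER_PAIR) / len(primary)


def junctions(pos, cigar):
    """Return (intron_start, intron_end) for each N operation in a CIGAR.

    pos is the 1-based leftmost reference position of the alignment;
    the returned coordinates are 1-based and inclusive.
    """
    found = []
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op == "N":
            found.append((ref, ref + n - 1))
        if op in "MDN=X":  # operations that consume reference bases
            ref += n
    return found
```

A read whose CIGAR contains one or more `N` operations spans an exon-exon junction, which is what a genome browser draws as a line across the intron.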


Figure 1: Workflow for RNA sequencing analysis. Reads are aligned with Tophat.
Properly paired reads are extracted and used to measure transcript expression with
Cufflinks and to call variants with SAMtools mpileup, leading to the identification
of alternative splicing events. Unmapped reads are extracted and used to identify
gene rearrangements and gene fusions.

Figure 2: Reads aligned to PRDM1 using Tophat and Bowtie lie primarily in exons.
Reads that span two exons are connected by a line across the intron that indicates
splicing.




We also examined the sequencing reads for strand-specificity and found strong
evidence of it. An example is shown in Figure 3 for the genomic region that encodes
the CD97 and DDX39 genes, which occupy overlapping loci on opposite DNA strands.


Figure 3: RNA-sequencing is allele (strand)-specific. CD97 and DDX39 are transcribed
on opposite strands in close proximity to one another. Splicing is indicated in red
on the positive strand and blue on the negative strand.


Identifying  genetic  variants  in  RNA  sequencing  data 


Sequencing reads were processed according to the workflow described above. All
variants that were called by SAMtools mpileup were filtered so that only those with a
quality score of 30 or better (i.e. an error probability of at most 1/1000) and more
than 5 supporting reads remained.
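The relationship between a Phred-scaled quality score and the error probability, together with the two filtering thresholds stated above, can be illustrated as follows. This is a sketch under stated assumptions: the variant records are represented as plain dictionaries with hypothetical keys `qual` and `depth`, and the example values are invented for illustration only.

```python
def phred_to_error(q):
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)


def passes_filter(qual, depth, min_qual=30, min_depth=5):
    """Keep a variant with quality >= min_qual and more than min_depth reads."""
    return qual >= min_qual and depth > min_depth


def filter_variants(variants, min_qual=30, min_depth=5):
    """Return the variants that satisfy both thresholds.

    Each variant is a dict with hypothetical keys 'qual' and 'depth'.
    """
    return [v for v in variants
            if passes_filter(v["qual"], v["depth"], min_qual, min_depth)]


# Illustrative values only, not data from this study.
example = [
    {"id": "v1", "qual": 45, "depth": 12},  # kept
    {"id": "v2", "qual": 29, "depth": 40},  # fails quality
    {"id": "v3", "qual": 60, "depth": 5},   # fails depth (needs more than 5)
]
kept = filter_variants(example)
```

A quality score of 30 corresponds to an error probability of 10^(-3), which is the 1/1000 figure quoted in the text.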


Figure 4: Variant in the gene GIMAP8 that is not present in the reference genome.


The genetic variants were annotated by genomic location using the publicly available
software SVA. Variants occurring in known gene-coding regions will be further
classified as synonymous, missense, nonsense, frameshift and splice site mutations.
Separately, the total numbers of transitions and transversions were calculated for
each sample. The deFuse algorithm is being applied to discover all gene fusions and
gene rearrangements in the transcriptome.
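Counting transitions and transversions per sample follows from the base classes alone: a substitution between two purines (A, G) or between two pyrimidines (C, T) is a transition, and any purine-pyrimidine exchange is a transversion. The sketch below assumes single-nucleotide variants given as (reference, alternate) pairs; the function names are hypothetical and the example input is illustrative only.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snv(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"


def count_ts_tv(snvs):
    """Return (transitions, transversions) for an iterable of (ref, alt)."""
    ts = tv = 0
    for ref, alt in snvs:
        if classify_snv(ref, alt) == "transition":
            ts += 1
        else:
            tv += 1
    return ts, tv


# Illustrative input only.
ts, tv = count_ts_tv([("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("T", "C")])
```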

We  estimate  the  rate  of  somatic  mutations  at  ~1-2  x  10'® 
and ~94% of these SNPs are already present in the dbSNP database. An example of
variant analysis is shown in Figure 4, where we depict a missense variant in the
GIMAP8 gene, which has been shown to be dysregulated in lung cancer. Its role in
breast cancer remains to be defined.


Real-time polymerase chain reaction (RT-PCR) to validate deep sequencing data


To verify that our high-throughput sequencing methods generated valid results, we
applied real-time polymerase chain reaction (RT-PCR) in the same cases to assess
exon-level expression. 1 µg of RNA was reverse-transcribed with the ABI (Carlsbad, CA)
High Capacity cDNA Reverse Transcription kit. Gene expression was measured with
exon-spanning Taqman probes, and normalized to beta-2 microglobulin expression.
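Normalization of a target exon to a reference gene such as beta-2 microglobulin is commonly expressed through threshold-cycle (Ct) differences. The sketch below uses the widely used 2^(-ΔCt) and 2^(-ΔΔCt) calculations; this is an assumption for illustration, since the report does not state which calculation was used, and it further assumes ideal amplification efficiency (a doubling per cycle). Function names and Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference):
    """Expression of the target relative to the reference gene: 2^(-dCt)."""
    delta_ct = ct_target - ct_reference
    return 2 ** (-delta_ct)


def fold_change(ct_target_sample, ct_ref_sample,
                ct_target_control, ct_ref_control):
    """Fold change of sample versus control: 2^(-ddCt)."""
    d_sample = ct_target_sample - ct_ref_sample
    d_control = ct_target_control - ct_ref_control
    return 2 ** (-(d_sample - d_control))


# Illustrative Ct values only, not measurements from this study.
rel = relative_expression(ct_target=24.0, ct_reference=20.0)
fc = fold_change(22.0, 20.0, 24.0, 20.0)
```

A lower Ct means the target crossed the detection threshold earlier, so a smaller ΔCt corresponds to higher expression relative to the reference.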

The  primers  used  are  shown  in  the  table  below. 


Primer ID      Exon Number   Total Number of Exons   Ensembl Transcript ID   Sequence
LMO2_5b_F      5             6                       ENST00000257818         GGAACCAGTGGATGAGGTGCTGCA
LMO2_5b_R      5             6                       ENST00000257818         TCAGGCAGTCCTCGTGCCAGTACTG
LMO2_6b_F      6             6                       ENST00000257818         CCCCCTTCCCAAGGCCTTAAGCTTTG
LMO2_6b_R      6             6                       ENST00000257818         ACTCCTCCCCTCAAAATGAAGGTGTCT
STFB_4b_F      4             11                      ENST00000519937         AAGTTCCTGGAGCAGGAGTGCAA
STFB_4b_R      4             11                      ENST00000519937         AGTAGTCGATGACCAGGGGGAAGTAGT
STFB_8b_F      8             11                      ENST00000519937         TGCCACCTCTGCATGTCCGTGA
STFB_8b_R      8             11                      ENST00000519937         TTTCCCTGTCCAGCCAGGAGCCAA
LYPD6B_4b_F    4             7                       ENST00000409642         CGCCCAGCACACAAGGTCAGCAT
LYPD6B_4b_R    4             7                       ENST00000409642         GGTCGAGAGGAGGCCTCACATTATAGA
LYPD6B_6b_F    6             7                       ENST00000409643         ACTTCACCAGCCACGGAAGAAGCA
LYPD6B_6b_R    6             7                       ENST00000409644         TGTTCAGAATCTCGGCTGTGGTGGC


We further validated the accuracy of exon array measurement by performing
quantitative RT-PCR to measure the exon-level expression of selected spliced genes
(LMO2, SFTPB, LYPD6B) in the same samples (Figure 5).


Figure  5:  Validation  of  selected  splicing  events. 

A The average expression levels of each exon in the LMO2 gene are depicted for different cases.
Vertical gray bars indicate probesets mapped to the exon that were selected for validation by
quantitative PCR. The blue connecting lines map the exons to Ensembl genes and the longest
Ensembl transcript is shown to elucidate exon structure on the gene. The graph was generated
using GenomeGraphs.

B Quantitative real-time PCR to measure the expression of exons 5 and 6 of the LMO2 gene.

C  The  average  expression  levels  of  each  exon  in  the  SFTPB  gene. 

D  Quantitative  real-time  PCR  to  measure  the  expression  of  exons  4  and  8  of  the  SFTPB  gene. 

E  The  average  expression  levels  of  each  exon  in  the  LYPD6B  gene. 

F  Quantitative  real-time  PCR  to  measure  the  expression  of  exons  4  and  6  of  the  LYPD6B  gene. 
The identification of alternative splicing in LMO2 allows the identification of its
functional role in cancer


To investigate the role of alternative splicing in the biology of oncogenes, we
surveyed known oncogenes that showed the highest levels of differential splicing.
The LMO2 gene was selected as a suitable candidate because it was among the most
highly differentially spliced genes. LMO2 is a known oncogene in the T cell leukemias
and is expressed highly in rapidly proliferating germinal center B cells.


The LMO2 gene comprises 6 exons. Our data indicate that only exons 5 and 6 of this
gene are expressed at significantly higher levels in different cases (Figure 6A). Exons 5
and 6 encode the LMO2 LIM-binding domains that mediate protein-protein interaction.
The remaining exons (exons 1-4) are non-coding, suggesting that alternative splicing of
exons 5 and 6 represents the primary mechanism of LMO2 regulation in cancers.


We reasoned that knowledge of the particular isoforms expressed in cancers
provided an opportunity to discover the potential downstream targets of LMO2 and its
function. We applied RNA-interference directed specifically at exon 6 of the LMO2 gene,
which is alternatively spliced. Using lentiviral vectors to deliver two separate shRNA
constructs to knock down LMO2, we generated cell lines that stably expressed the
shRNAs and consequently had lower levels of LMO2 mRNA (Figure 6B) and protein
expression (Figure 6C).


We then performed gene expression profiling on the control cells and cells with LMO2
knockdown. We identified 338 genes that were down-regulated at least 1.5-fold in both
RNA-interference experiments (Figure 6D). These genes are listed in Supplement Table
4. We performed gene set enrichment analysis and found that genes related to
proliferation were highly associated with LMO2 knockdown (Figure 6E, P<0.01,
FDR<0.1).
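The selection criterion described above, genes down-regulated at least 1.5-fold in both RNA-interference experiments, amounts to intersecting two thresholded gene sets. The following sketch assumes linear-scale expression values held in dictionaries keyed by gene; the function names, gene labels and values are hypothetical placeholders rather than data from this study.

```python
def downregulated(control, knockdown, min_fold=1.5):
    """Genes whose control/knockdown expression ratio is at least min_fold.

    control and knockdown map gene name -> linear-scale expression value.
    """
    return {
        gene
        for gene, ctrl in control.items()
        if gene in knockdown
        and knockdown[gene] > 0
        and ctrl / knockdown[gene] >= min_fold
    }


def shared_downregulated(control, knockdown_a, knockdown_b, min_fold=1.5):
    """Genes down-regulated at least min_fold in both knockdown experiments."""
    return (downregulated(control, knockdown_a, min_fold)
            & downregulated(control, knockdown_b, min_fold))


# Placeholder values for illustration only.
control = {"GENE_A": 300.0, "GENE_B": 200.0, "GENE_C": 100.0}
shrna_1 = {"GENE_A": 100.0, "GENE_B": 100.0, "GENE_C": 90.0}
shrna_2 = {"GENE_A": 150.0, "GENE_B": 190.0, "GENE_C": 50.0}
shared = shared_downregulated(control, shrna_1, shrna_2)
```

Requiring agreement between two independent shRNA constructs reduces the chance that a change reflects an off-target effect of a single construct.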

We also examined the effect of LMO2 knockdown on cellular proliferation rates. We
found that LMO2 silencing was significantly associated with decreased cellular
proliferation in in vitro experiments (Figure 6F). LMO2 silencing did not alter cellular
viability (not shown) but significantly decreased the proliferation rate (P<0.01). These
data indicate a potential role for LMO2 in mediating responsiveness to chemotherapy in
tumors expressing the splicing variants that include exons 5 and 6.
Figure 6: Splicing events in LMO2 result in altered phenotypes in cancer.

A Differential expression of exons 5 and 6 distinguishes different cancers. The
remaining exons are expressed at roughly equivalent levels. Vertical gray bars indicate
probesets mapped to the exon that were selected for validation by quantitative PCR.
The blue connecting lines map the exons to Ensembl genes and the longest Ensembl
transcript is shown to elucidate exon structure on the gene. The graph was generated
using GenomeGraphs.

B Lentiviral delivery of shRNAs targeting the 6th exon of LMO2 results in reduction of
LMO2 mRNA levels compared to non-silencing controls. Two different shRNA constructs
were used in two separate cell lines.

C Lentiviral delivery of shRNAs targeting the 6th exon of LMO2 results in reduction of
LMO2 protein levels in the cell lines compared to ACTIN controls.

D Genes that were reduced at least 1.5-fold in response to stable knockdown of the
LMO2 gene in the cell lines are depicted. Down-regulated gene expression is shown in
green over a 2-fold range. Selected genes are labeled in this heat-map.

E Gene set enrichment analysis demonstrates that proliferation is a key cellular
process that is highly altered in response to LMO2 knockdown.

F LMO2 knockdown results in significantly decreased cellular proliferation (P<0.01).
Cellular viability was not significantly altered by LMO2 knockdown. Cell proliferation
and viability of cells expressing the indicated shRNAs were determined by trypan blue
exclusion.


These data point to a significant role for LMO2 in regulating cellular proliferation in
cancer.


Summary  of  all  work  done  during  the  granting  period 

Thus,  although  the  originally  proposed  molecular  barcode  strategy  could  not  be 
implemented  after  the  departure  of  Dr.  Jun  Zhu  from  Duke,  we  have  made  significant 
findings  in  two  major  areas  that  are  consistent  with  the  original  goals  of  the  application: 

1)  We  have  developed  allele-specific  RNA-sequencing  to  identify  alternative  splicing 
events. 

2) We have identified novel alternative splicing events in LMO2 and identified its
downstream effects through RNA-interference experiments.
Key Research Accomplishments

• Development of allele-specific RNA sequencing as a novel approach to analyzing
RNA-Seq data.

•  Identification  of  novel  alternative  splicing  events  in  breast  cancers. 

• Identification of alternative splicing as a novel mechanism for regulation of the
oncogene LMO2.

Reportable  Outcomes 

Posters  and  Presentations 

E. Lalonde. Characterizing the BRCA1-deficient breast cancer transcriptomes by
RNA-Seq [presentation]. Era of Hope Meeting, Orlando, Florida, August 4th 2011.

Characterization of BRCA1-deficient breast cancer transcriptomes by RNA-Seq reveals
novel transcript isoforms [poster]. American Society of Human Genetics Annual
Meeting, Washington D.C., November 2-6 2010.

Published  papers  (work  directly  or  indirectly  supported  by  this  award) 

Lalonde  E,  Ha  KCH,  Wang  Z,  Bemmo  A,  Kleinman  C,  Kwan  T,  Pastinen  T,  and 
Majewski  J  (2011).  RNA  sequencing  reveals  the  role  of  splicing  polymorphisms  in 
regulating  human  gene  expression.  Genome  Research,  21,  545-554. 

Jima, D.D. et al. Deep sequencing of the small RNA transcriptome of normal and
malignant human B cells identifies hundreds of novel microRNAs. Blood 116, e118-27
(2010). (While this project is not solely related to breast cancer, the techniques were
acquired during the 3 years of the DOD award.)

Shuen  AY,  Foulkes  WD.  Inherited  mutations  in  breast  cancer  genes-risk  and  response. 
J  Mammary  Gland  Biol  Neoplasia.  2011  Apr;16(1):3-15. 

Martinez-Marignac  VL,  Rodrigue  A,  Davidson  D,  Couillard  M,  Al-Moustafa  AE, 
Abramovitz  M,  Foulkes  WD,  Masson  JY,  Aloyz  R.  The  effect  of  a  DNA  repair  gene  on 
cellular invasiveness: XRCC3 overexpression in breast cancer cells. PLoS One. 2011
Jan  24;6(1):e16394. 

Conclusion 

In this proposal, we set out to evaluate the importance of splicing for breast cancer
biology. Our work provides a starting point for the global identification of oncogenic
events and the experimental methodologies for the functional characterization of such
events. This work has produced a large amount of data that will inform these
questions for the next several years. We hope that our approach will lead to significant
insights into the more general question of the importance of alternative splicing in breast
cancer biology.